JSON Interop
Python’s simplejson, in an apparent attempt to avoid Unicode issues, defaults to encoding all non-ASCII characters using JSON’s \uXXXX
syntax. Ironically, this causes problems with, of all languages, JavaScript:
    $ js
    js> load('json.js')
    js> print("\u263A".toJSONString());
    ":"
    js> print(unescape(encodeURIComponent("\u263A".toJSONString())));
    "☺"
The second, rather unobvious combination converts the string to UTF-8 before printing and produces the correct result. A workaround on the Python side would be:
    $ python
    >>> import simplejson
    >>> simplejson.dumps(u"\u263A", ensure_ascii=False).encode('utf-8')
    '"\xe2\x98\xba"'
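For comparison, here is a small sketch of the two output forms side by side. It assumes Python 3 and the standard library's json module (which grew out of simplejson) rather than simplejson itself, and the variable names are made up for illustration. It only shows that the escaped and the raw UTF-8 forms decode to the same string; it says nothing about how any particular JavaScript shell will print them.

```python
import json

smiley = "\u263A"  # WHITE SMILING FACE

# Default behaviour: non-ASCII characters are written as \uXXXX escapes,
# so the resulting JSON text is pure ASCII.
escaped = json.dumps(smiley)

# With ensure_ascii=False the character is left as-is; encoding to UTF-8
# is then an explicit, separate step.
raw_utf8 = json.dumps(smiley, ensure_ascii=False).encode("utf-8")

# Both forms are valid JSON and decode back to the same one-character string.
decoded_from_escaped = json.loads(escaped)
decoded_from_utf8 = json.loads(raw_utf8.decode("utf-8"))
```

Either form should be acceptable to a conforming JSON parser; the choice mostly matters when something downstream mishandles one of them, as the shell does above.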
Update: bug 397215 has been opened against the SpiderMonkey shell, and a compile-time switch is already available to handle UTF-8 correctly. See the comments for details.